Introduction to Data Analysis and Graphical Presentation in Biostatistics with R by Thomas W. MacFarland


Author: Thomas W. MacFarland
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


5.4 Conduct a Visual Data Check

With the data in proper format, it would be common to move immediately to the appropriate inferential analysis, which for this lesson is Oneway ANOVA. However, it is best to first prepare a few graphical displays of the data and then reinforce comprehension of the data with descriptive statistics, including measures of central tendency.

The summary() function, min() function, and max() function are all certainly useful for data checking, but a visual data check also offers many advantages. In this case, simple plots can be very helpful in looking for data that may be either illogical or out-of-range. These initial plots are, by design, simple and should be considered throwaways, since they are intended only for initial diagnostic purposes. More complex figures, often of publishable quality, can then be developed from these initial throwaway graphics through careful selection of functions and arguments.
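As a reminder of what the numeric checks report, the short sketch below applies summary(), min(), and max() to a hypothetical vector, M1.demo (invented values, not the lesson's data). Note that min() and max() return NA when any value is missing unless na.rm=TRUE is supplied, whereas summary() reports the number of missing values as a separate NA's entry.

```r
# Hypothetical values standing in for a numeric measure such as M1 (invented)
M1.demo <- c(112.5, 245.0, NA, 389.25, 57.0, 601.5)

summary(M1.demo)             # Min., quartiles, Mean, Max., and an NA's count
min(M1.demo)                 # NA: a single missing value propagates
min(M1.demo, na.rm = TRUE)   # 57
max(M1.demo, na.rm = TRUE)   # 601.5
```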

Although the emphasis in this lesson is on Oneway ANOVA for the factor-type object variable F2b (five breakout groups) and the numeric-type object variable M1 (values can range from 0.00 to 600.00), a simple graphic will be prepared for each variable, largely as a quality assurance check against the entire dataset. Experienced researchers practice quality assurance in multiple ways and at multiple times.
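Because M1 has a stated range of 0.00 to 600.00, out-of-range values can also be listed directly, not only spotted on a plot. The sketch below is one simple way to do this; M1.demo, lower, and upper are hypothetical names, and the values are invented, with two deliberate errors included.

```r
# Hypothetical M1-like values (invented), including two deliberate errors
M1.demo <- c(112.5, 245.0, NA, 389.25, -3.0, 601.5)
lower <- 0.00     # stated minimum for M1
upper <- 600.00   # stated maximum for M1

# TRUE where a non-missing value falls outside the stated limits
out.of.range <- !is.na(M1.demo) & (M1.demo < lower | M1.demo > upper)

which(out.of.range)      # 5 6: positions of the suspect values
M1.demo[out.of.range]    # -3.0 601.5
```

The !is.na() term keeps missing values from being reported as out-of-range; missing data are a separate check.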

names(BioSpmen.df) # Confirm all object variables.

par(ask=TRUE)

plot(BioSpmen.df$ID,
  main="BioSpmen.df$ID Visual Data Check")
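If each specimen has its own identifier, the index plot of ID should rise as a steady line, and flat steps or sudden jumps deserve a closer look. As a minimal numeric complement, the sketch below assumes (an assumption, not stated in the lesson) that IDs are meant to be unique, consecutive integers; the vector ID.demo is hypothetical and contains one deliberate duplicate and one gap.

```r
# Hypothetical ID values (invented): 3 is repeated and 4 is missing.
# Assumes IDs should be unique consecutive integers, which may not hold for every dataset.
ID.demo <- c(1, 2, 3, 3, 5, 6)

anyDuplicated(ID.demo)          # 4: position of the first repeated ID
ID.demo[duplicated(ID.demo)]    # 3: the repeated value
setdiff(seq(min(ID.demo), max(ID.demo)), ID.demo)   # 4: absent from the sequence

# Visual version of the same check: the flat step and the jump are easy to see
plot(ID.demo, type = "b", main = "ID.demo Visual Data Check")
```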

par(ask=TRUE)

plot(density(BioSpmen.df$M1,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M1",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)

plot(density(BioSpmen.df$M2,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M2",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)

plot(density(BioSpmen.df$M3a,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3a",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)

plot(density(BioSpmen.df$M3b,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3b",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)

plot(density(BioSpmen.df$M3c,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3c",
  lwd=6, col="red", font.axis=2, font.lab=2)
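The five density plots above differ only in the variable name, so they can also be produced in a loop. The sketch below is one way to do this, not the lesson's own code: demo.df is a hypothetical data frame of simulated values that reuses the M-series column names, and par(mfrow) places all panels on one page instead of prompting between plots. It also shows why na.rm=TRUE matters here: density() stops with an error when its input contains NA.

```r
set.seed(1)   # reproducible simulated values
n <- 40
# Hypothetical stand-in for the lesson's measures; values are simulated, not real
demo.df <- data.frame(M1  = rnorm(n, mean = 300, sd = 60),
                      M2  = rnorm(n, mean = 50,  sd = 10),
                      M3a = runif(n, 0, 100),
                      M3b = runif(n, 0, 100),
                      M3c = runif(n, 0, 100))
demo.df$M1[c(5, 17)] <- NA     # add missing values, as real data often have

# Without na.rm = TRUE, density() signals an error on missing values
err <- tryCatch(density(demo.df$M1), error = function(e) conditionMessage(e))
print(err)

measures <- c("M1", "M2", "M3a", "M3b", "M3c")
# One density estimate per column; na.rm = TRUE is passed on to density()
dens <- lapply(demo.df[measures], density, na.rm = TRUE)

op <- par(mfrow = c(2, 3))     # 2 x 3 grid of panels on one page
for (v in measures) {
  plot(dens[[v]], main = paste("Density Plot of", v),
       lwd = 3, col = "red", font.axis = 2, font.lab = 2)
}
par(op)                        # restore the previous layout
```

Computing the density objects first, then plotting, keeps them available for later inspection (for example, dens$M1$bw gives the bandwidth chosen for M1).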

par(ask=TRUE)

plot(BioSpmen.df$F1.recode,
  main="BioSpmen.df$F1.recode Visual Data Check")

par(ask=TRUE)

plot(BioSpmen.df$F2a.recode,
  main="BioSpmen.df$F2a.recode Visual Data Check")

par(ask=TRUE)

plot(BioSpmen.df$F2b.recode,
  main="BioSpmen.df$F2b.recode Visual Data Check",
  col=gray(4:0/4))

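# Note the gray shading scheme: 4:0/4 evaluates as (4:0)/4,
# giving the five gray levels 1.00, 0.75, 0.50, 0.25, and 0.00
# (white to black), one shade per breakout group. The sequence
# runs from 0 to 4 and not 1 to 5; counts that begin with 0
# represent a common means of counting in computer science.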
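The gray(4:0/4) argument can be checked directly at the console. Because : binds more tightly than / in R, 4:0/4 is (4:0)/4, and gray() maps level 1 to white and level 0 to black. The factor F2b.demo below is hypothetical (its labels are invented); it illustrates that plot() on a factor draws a bar chart of level counts, the same chart barplot(table()) produces.

```r
shade <- 4:0/4          # same as (4:0)/4
shade                   # 1.00 0.75 0.50 0.25 0.00
gray(shade)             # "#FFFFFF" "#BFBFBF" "#808080" "#404040" "#000000"

# Hypothetical five-level factor standing in for F2b (labels invented)
F2b.demo <- factor(c("A", "B", "B", "C", "D", "E", "E", "E"))
table(F2b.demo)         # counts per level: A 1, B 2, C 1, D 1, E 3

# Equivalent bar charts: plot() on a factor calls barplot() on its table of counts
plot(F2b.demo, col = gray(shade), main = "F2b.demo via plot()")
barplot(table(F2b.demo), col = gray(shade), main = "F2b.demo via barplot()")
```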

The purpose of these initial plots is to gain a general sense of the data and, equally, to look for outliers. To that end, the ylim argument has been deliberately omitted so that all data points are plotted. Extreme values may or may not be outliers, but they are certainly interesting and demand attention.
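Because extreme values demand attention, it can help to list them as well as see them. One conventional rule, swapped in here rather than taken from the lesson, is Tukey's rule as implemented by base R's boxplot.stats(): values lying more than 1.5 times the hinge spread (approximately the interquartile range) beyond the hinges are returned in $out, and these are the points boxplot() draws individually. The values in M1.demo are invented, and a flagged value is a candidate for checking, not automatically an error.

```r
# Hypothetical M1-like values (invented) with one extreme value
M1.demo <- c(210, 225, 240, 255, 260, 275, 290, 305, 590)

# coef = 1.5 (the default) sets the fences at 1.5 x the hinge spread
bs <- boxplot.stats(M1.demo, coef = 1.5)
bs$stats    # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out      # 590: beyond the upper fence of 290 + 1.5 * (290 - 240) = 365
```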

This sample lesson has been designed to look into the nature of the numeric-type object variable M1 and the factor-type object variable F2b. Given the continuous nature of M1 values, it may also be a good idea to supplement the plot(density()) function with the hist() function and the boxplot() function, to gain another view of the values for this object variable. Although object variable M1 does not show a perfectly normal, bell-shaped distribution, it is assumed that the distribution of M1 approximates the conditions needed for correct use of Oneway ANOVA. Functions in the lattice package and the sm package may also be useful for further exploring how the data are organized.
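Since Oneway ANOVA compares M1 across the F2b breakout groups, a per-group view is a natural next step. The sketch below uses two functions from the lattice package (a recommended package distributed with standard R installations): densityplot() draws one density panel per group and bwplot() draws side-by-side box-and-whisker plots. The data frame demo.df, its group labels G1 to G5, and its values are hypothetical and simulated; the sm package is not used here.

```r
library(lattice)   # recommended package, included with standard R installations

set.seed(7)
# Hypothetical stand-in for M1 by five F2b breakout groups (values simulated)
demo.df <- data.frame(
  F2b = factor(rep(c("G1", "G2", "G3", "G4", "G5"), each = 20)),
  M1  = rnorm(100, mean = 300, sd = 60)
)

# One density panel per group: is each group's distribution roughly
# symmetric and unimodal?
p1 <- densityplot(~ M1 | F2b, data = demo.df, plot.points = FALSE,
                  main = "Density of M1 by F2b group")
# Side-by-side box-and-whisker plots of M1 by group
p2 <- bwplot(F2b ~ M1, data = demo.df, main = "M1 by F2b group")

print(p1)   # lattice objects are drawn when printed
print(p2)
```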

par(ask=TRUE)

hist(BioSpmen.df$M1,
  main="BioSpmen.df$M1 Visual Data Check (Histogram)",
  font=2,        # Bold text
  cex.lab=1.15,  # Large font
  col="red")     # Vibrant color
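The histogram's bins can also be inspected as numbers. With plot = FALSE, hist() returns the bin edges and counts without drawing anything, and supplying breaks explicitly gives fixed-width bins across the stated 0.00 to 600.00 range of M1. The values in M1.demo are simulated and hypothetical; hist() silently drops the two NA values.

```r
set.seed(3)
# Hypothetical M1-like values on the stated 0.00 to 600.00 scale, plus two NAs
M1.demo <- c(runif(60, min = 0, max = 600), NA, NA)

# plot = FALSE returns the binning without drawing anything
h <- hist(M1.demo, plot = FALSE)
h$breaks          # bin edges from the default Sturges rule
h$counts          # number of observations in each bin

# Fixed 50-unit bins spanning the full stated range make gaps and pile-ups easy to spot
h50 <- hist(M1.demo, breaks = seq(0, 600, by = 50), plot = FALSE)
h50$counts
plot(h50, main = "M1.demo with 50-unit bins", col = "red")
```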

par(ask=TRUE)

boxplot(BioSpmen.


